A Global Convergence Theory
for Arbitrary Norm Trust-Region Methods
for Nonlinear Equations

M. El Hallabi    R.A. Tapia


Abstract. In this work we extend the Levenberg-Marquardt algorithm for approximating
zeros of the nonlinear system $F(x) = 0$, where $F : \mathbb{R}^n \to \mathbb{R}^n$ is continuously
differentiable. Instead of the $\ell_2$ norm, arbitrary norms can be used in the trust-region
objective function and in the trust-region constraint. The algorithm is shown to be globally
convergent. This research was motivated by the recent work of Duff, Nocedal and Reid.
A key point in our analysis is that the tools from nonsmooth analysis and the Zangwill
convergence theory allow us to establish essentially the same properties for an arbitrary
norm trust-region algorithm that have been established for the Levenberg-Marquardt
algorithm using the tools from smooth optimization. It is shown that all members of this
class of algorithms locally reduce to Newton's method and that the iteration sequence
actually converges to a solution.

Key Words: trust region, Newton's method, global convergence, superlinear convergence,
quadratic convergence, nonlinear systems.
1. Introduction

In this paper we consider the problem of solving the nonlinear system of equations

    $F(x) = 0$,        (1)

where $F : \mathbb{R}^n \to \mathbb{R}^n$ is a continuously differentiable function. We will be concerned with the fact that the
Jacobian of $F$ at $x$, say $F'(x)$, may be sparse.

Locally, problem (1) is often solved by Newton's method. Globally, difficulties arise when the Newton
step, $s_k^N = -[F'(x_k)]^{-1}F(x_k)$, lies outside the region where the linear model $F(x_k) + F'(x_k)s$ is a good
approximation to $F(x_k + s)$. One effective remedy when this occurs is to restrict the step $s$ to a region where
the linear model can be trusted. The classical approach for accomplishing this objective is the well-known
Levenberg-Marquardt trust-region algorithm, where the step $s_k$ is the solution of the subproblem

    minimize    $\|F(x_k) + F'(x_k)s\|_2^2$        (2a)

    subject to  $\|s\|_2^2 \le \delta_k^2$.        (2b)

The Karush-Kuhn-Tucker conditions for Problem (2) are equivalent to

    $s(\nu) = -[F'(x_k)^T F'(x_k) + \nu I]^{-1} F'(x_k)^T F(x_k)$        (3a)

    $\nu \ge 0$,  $\|s(\nu)\|_2^2 \le \delta_k^2$,  and  $\nu\,(\|s(\nu)\|_2^2 - \delta_k^2) = 0$.        (3b)

The solution of Problem (2) is $s(\nu_k)$, where $\nu_k$ satisfies $\|s(\nu_k)\|_2 = \delta_k$, unless $\|s(0)\|_2 \le \delta_k$, in which case
$s(0) = s_k^N$, i.e. the Newton step is the solution of Problem (2). It can be obtained by the robust
Hebden-Moré implementation of the Levenberg-Marquardt algorithm described in Moré [17]. For Problem (2),
these conditions are both necessary and sufficient. However, for larger systems, this approach has the
disadvantage that (3) has to be evaluated for several values of $\nu$ at each iteration. Also, it is not obvious
how one utilizes sparsity here, since multiplying a matrix by its transpose may destroy sparsity.
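To make the cost of evaluating (3) for several values of $\nu$ concrete, the following is a minimal sketch for a 2 x 2 system, assuming a nonsingular Jacobian. It locates $\nu_k$ by plain bisection, which is a simple stand-in chosen for illustration and not the Hebden-Moré iteration of [17]. The function names `lm_step`, `norm2` and `lm_trust_region_step` are hypothetical.

```python
import math

def lm_step(J, F, nu):
    """Return s(nu) = -(J^T J + nu I)^{-1} J^T F for a 2x2 Jacobian J, as in (3a)."""
    # Normal-equation matrix A = J^T J + nu I and right-hand side b = -J^T F.
    a11 = J[0][0] * J[0][0] + J[1][0] * J[1][0] + nu
    a12 = J[0][0] * J[0][1] + J[1][0] * J[1][1]
    a22 = J[0][1] * J[0][1] + J[1][1] * J[1][1] + nu
    b1 = -(J[0][0] * F[0] + J[1][0] * F[1])
    b2 = -(J[0][1] * F[0] + J[1][1] * F[1])
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the symmetric 2x2 system A s = b.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def norm2(s):
    return math.hypot(s[0], s[1])

def lm_trust_region_step(J, F, delta, tol=1e-12):
    """Solve (2) via (3): bisection on nu (illustrative; assumes J nonsingular)."""
    s = lm_step(J, F, 0.0)
    if norm2(s) <= delta:
        return s, 0.0          # the Newton step already lies in the trust region
    lo, hi = 0.0, 1.0
    while norm2(lm_step(J, F, hi)) > delta:
        hi *= 2.0              # ||s(nu)||_2 decreases as nu grows
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if norm2(lm_step(J, F, mid)) > delta:
            lo = mid
        else:
            hi = mid
    return lm_step(J, F, hi), hi
```

Each bisection step requires a fresh solve with $F'(x_k)^T F'(x_k) + \nu I$, which is the repeated work the paragraph above refers to.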

To avoid solving (3) at each iteration, the dogleg (Powell [18]) or the double dogleg (Dennis and Mei [5])
can be used to obtain a good approximation to the solution of Problem (2). However, we cannot expect
the dogleg strategies to be as robust as the Levenberg-Marquardt algorithm. In fact, Reid [20] adapted the
dogleg method to the sparse case, and reported finding examples for which the method did not converge,
but the standard Levenberg-Marquardt method did converge.

Duff, Nocedal, and Reid [7] suggested replacing the square of the $\ell_2$-norm in (2a) and (2b) with the
$\ell_1$-norm in (2a) and the $\ell_\infty$-norm in (2b). The resulting trust-region subproblem can be reformulated,
in a standard manner, as a linear program, and hence, unlike the Levenberg-Marquardt approach, it is possible
to take advantage of any sparsity patterns in the Jacobian $F'(x_k)$. Since $f = \|F\|_1$ is not differentiable,
Duff, Nocedal and Reid use

    $\|F(x + s)\|_1 \le \|F(x)\|_1 - c_0 \|F'(x)s\|_1$        (4)

as an acceptance criterion. It will be shown in Lemma 2.2 that

    $-\|F'(x)s\|_1 \le f'(x; s)$,        (5)

which implies that the descent condition (4) may be excessively conservative. Duff, Nocedal, and Reid do
not include convergence results. However, they do give a detailed description of their algorithm and its
implementation and point out that it is competitive with other methods.
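For concreteness, one standard way to write the $\ell_1$/$\ell_\infty$ subproblem as a linear program is sketched below. The auxiliary vector $t \in \mathbb{R}^n$ is introduced here for illustration only, and the formulation actually used in [7] may differ in detail.

```latex
\begin{aligned}
\min_{s,\,t}\quad & \sum_{i=1}^{n} t_i \\
\text{subject to}\quad & -t \;\le\; F(x_k) + F'(x_k)\,s \;\le\; t, \\
 & -\delta_k \;\le\; s_j \;\le\; \delta_k, \qquad j = 1,\dots,n .
\end{aligned}
```

At a solution $t_i = |(F(x_k) + F'(x_k)s)_i|$, so the objective equals $\|F(x_k) + F'(x_k)s\|_1$, and the only matrix appearing in the constraints is $F'(x_k)$ itself, so its sparsity pattern is preserved.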
The use of a different norm in (2a) instead of the square of the $\ell_2$ norm, and of various alternatives to (2b),
has been suggested and investigated by many authors. Madsen [15] uses the $\ell_\infty$-norm and considers the
overdetermined system $F(x) = 0$ where $F : \mathbb{R}^n \to \mathbb{R}^m$, $n < m$. Powell [19] also considered a trust-region
algorithm for minimizing $h(F(x))$, where $F : \mathbb{R}^n \to \mathbb{R}^m$, $n < m$, is continuously differentiable and $h$ is any
coercive continuous convex function. Both algorithms in Madsen [15] and in Powell [19] are demonstrated
to be globally convergent in the sense that

    $\lim_{k \to +\infty} \psi(x_k) = 0$

where

    $\psi(x) = h(F(x)) - \min \{ h(F(x) + F'(x)s) \mid \|s\| \le 1 \}$.

In a similar approach to Powell [19], Yuan [21], [22] uses the very simple descent condition

    $h(F(x_{k+1})) < h(F(x_k))$.

He proves that $\liminf_{k \to +\infty} \psi(x_k) = 0$.

In the present work, we propose a class of globally convergent trust-region algorithms for approximating
zeros of the square nonlinear system (1). At each iteration, we solve the following model trust-region problem:

    minimize    $m_k(s) = \|F(x_k) + F'(x_k)s\|_a$        (6a)

    subject to  $\|s\|_b \le \delta_k$,        (6b)

where $\|\cdot\|_a$ and $\|\cdot\|_b$ are two arbitrary but fixed norms on $\mathbb{R}^n$.

In Section 2 we compare differentiability properties of the function $f = \|F\|$ and the local model $m_x(s) =
\|F(x) + F'(x)s\|$. We also derive a rather weak sufficient condition for stationary points to be solutions of
the nonlinear system $F(x) = 0$. The General Trust-Region Algorithm is described in Section 3. In Section
4 we extend to arbitrary trust-region algorithms the well-known result that the solution of the
Levenberg-Marquardt model trust-region problem (2) approaches a steepest descent direction as the trust-region radius
approaches zero.

Since in our analysis we will consider iterates having the form $(x_k, \delta_k)$ where $x_k$ and $\delta_k$ will not be
uniquely specified, we choose to model our algorithm with a point-to-set map. Therefore in Section 5 we
review some properties of point-to-set maps and Zangwill's convergence theorem [23].

The bulk of our analysis is contained in Section 6, where we demonstrate that the General Trust-Region
Algorithm is globally convergent. In Section 7 we establish that either all accumulation points of the
sequence generated by the General Trust-Region Algorithm are solutions of $F(x) = 0$ or no linear system
$F(x_*) + F'(x_*)s = 0$, where $x_*$ is an arbitrary accumulation point of the sequence, has a solution. We then
use a theorem of Eisenstat and Walker [8] to show that the General Trust-Region Algorithm converges to
a solution of $F(x) = 0$ whenever the iteration sequence has an accumulation point $x_*$ such that $F'(x_*)$ is
nonsingular. The q-quadratic convergence of the algorithm is demonstrated in Section 8 by proving that
the General Trust-Region Algorithm reduces to Newton's method after a finite number of steps. Finally, in
Section 9 we present a summary and some concluding remarks.

2. Differentiability of $f = \|F\|$ and Optimality Conditions

In this section, we present subdifferentiability properties of $f = \|F\|$, where $F : \mathbb{R}^n \to \mathbb{R}^n$ is continuously
differentiable. These properties are needed to derive the optimality conditions and to characterize the
solutions of $F(x) = 0$.
The locally Lipschitz function $f$ is regular, i.e. its one-sided directional and generalized directional
derivatives at $x \in \mathbb{R}^n$ in the direction $s \in \mathbb{R}^n$, denoted $f'(x; s)$ and $f^\circ(x; s)$ respectively, exist and are equal. They
are defined respectively by

    $f'(x; s) = \lim_{t \downarrow 0} \frac{f(x + ts) - f(x)}{t}$        (7)

and

    $f^\circ(x; s) = \limsup_{y \to x,\ t \downarrow 0} \frac{f(y + ts) - f(y)}{t}$.        (8)

Moreover, the generalized gradient of $f$ at $x$, denoted $\partial f(x)$, is the subset of $\mathbb{R}^n$ defined by

    $\partial f(x) = \{ g \in \mathbb{R}^n \mid f^\circ(x; s) \ge g^T s, \ \forall s \in \mathbb{R}^n \}$.        (9)

We refer to [3] for more details about subdifferentiability properties.

The following lemma shows that the local model $m_x$ and the function $f$ have the same descent directions.
This is important from an algorithmic point of view.

Lemma 2.1. Let $x$ and $s$ be any points in $\mathbb{R}^n$, $F : \mathbb{R}^n \to \mathbb{R}^n$ a continuously differentiable function at
$x$, and $f = \|F\|$. Then

    $f'(x; s) = m_x'(0; s)$,        (10)

where

    $m_x(s) = \|F(x) + F'(x)s\|$.        (11)

Proof. Because $F$ is differentiable at $x$, we have

    $F(x + ts) = F(x) + tF'(x)s + o(t)$

where

    $\lim_{t \to 0} \frac{\|o(t)\|}{t} = 0$.

Using the triangle inequality, we establish that

    $\frac{m_x(ts) - f(x)}{t} - \frac{\|o(t)\|}{t} \le \frac{f(x + ts) - f(x)}{t} \le \frac{m_x(ts) - f(x)}{t} + \frac{\|o(t)\|}{t}$

and, since $m_x(0) = f(x)$, by taking the limit as $t$ decreases to zero, we obtain (10). □

The following lemmas suggest that an approximation of the directional derivative, say $\gamma$, that can be
used in a relaxed descent condition test should satisfy

    $\max \{ f'(x; s),\ \frac{c_1}{c_0} [ m_x(s) - f(x) ] \} \le \gamma(x, s)$.        (12)

They also demonstrate the conservatism of the choice (4).

Lemma 2.2. Let $x$ be any point in $\mathbb{R}^n$, $F : \mathbb{R}^n \to \mathbb{R}^n$ a continuously differentiable function at $x$, and
$f = \|F\|$. Then

    $f'(x; s) \le \|F(x) + F'(x)s\| - \|F(x)\|$        (13a)

and

    $-\|F'(x)s\| \le f'(x; s)$        (13b)

for all $s \in \mathbb{R}^n$. Moreover, if the linear system

    $F(x) + F'(x)s = 0$        (14a)

has a solution $s_*$, then

    $f'(x; s_*) = -\|F(x)\|$.        (14b)

Proof. The inequality (13a) is a consequence of the convexity of the function $m_x$ and Lemma 2.1. On
the other hand, we have for all $s \in \mathbb{R}^n$ and for all positive $t$

    $\big|\, \|F(x) + tF'(x)s\| - \|F(x)\| \,\big| \le t\,\|F'(x)s\|$,

which, together with the definition of $m_x$, implies

    $-\|F'(x)s\| \le \frac{m_x(ts) - m_x(0)}{t} \quad \forall t > 0$.

By passing to the limit as $t$ decreases to zero and using Lemma 2.1 we obtain (13b). We now suppose that
the linear system (14a) has a solution $s_*$. Then the inequalities (13a) and (13b) become

    $f'(x; s_*) \le -\|F(x)\|$  and  $-\|F'(x)s_*\| \le f'(x; s_*)$.

The result now follows from the equality $\|F(x)\| = \|F'(x)s_*\|$. □

The property in equality (14b) is also proved in Burdakov [2] for the special case where $s_*$ is the Newton
direction, i.e. $s_* = -[F'(x)]^{-1}F(x)$, and also for any norm.
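The bounds of Lemma 2.2 can be checked numerically. The sketch below does so for the $\ell_1$ norm at a point where one component of $F$ vanishes, so that $f$ is not differentiable there. The test function `F`, its Jacobian `jac`, and the helper names are hypothetical choices made for illustration. The closed form in `dir_deriv_l1` is the directional derivative of $t \mapsto \|F(x) + tF'(x)s\|_1$ at $t = 0$, which equals $f'(x; s)$ by Lemma 2.1.

```python
def F(x):
    # Illustrative map R^2 -> R^2; F(1, 1) = (-1, 0), so f = ||F||_1 has a kink there.
    return [x[0] ** 2 + x[1] - 3.0, x[0] - x[1]]

def jac(x):
    return [[2.0 * x[0], 1.0], [1.0, -1.0]]

def norm1(v):
    return sum(abs(vi) for vi in v)

def matvec(J, s):
    return [sum(Jij * sj for Jij, sj in zip(row, s)) for row in J]

def dir_deriv_l1(Fx, Js):
    """One-sided derivative at t = 0 of t -> ||Fx + t*Js||_1."""
    total = 0.0
    for Fi, di in zip(Fx, Js):
        if Fi > 0.0:
            total += di
        elif Fi < 0.0:
            total -= di
        else:
            total += abs(di)   # a zero component contributes |d_i|
    return total

def check_bounds(x, s, t=1e-7):
    """Return (-||F'(x)s||, f'(x;s), finite-difference estimate, m_x(s) - f(x))."""
    Fx, Js = F(x), matvec(jac(x), s)
    fprime = dir_deriv_l1(Fx, Js)
    fd = (norm1(F([xi + t * si for xi, si in zip(x, s)])) - norm1(Fx)) / t
    lower = -norm1(Js)
    upper = norm1([Fi + di for Fi, di in zip(Fx, Js)]) - norm1(Fx)
    return lower, fprime, fd, upper
```

For $x = (1, 1)$ and $s = (1, 0)$ the left-hand side of (13b) is $-3$ while $f'(x; s) = -1$, which illustrates how much slack the choice (4) can leave. For the Newton direction $s_* = (1/3, 1/3)$ the same routine returns $f'(x; s_*) = -\|F(x)\|_1$, in agreement with (14b).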

Lemma 2.3. Assume the hypotheses of Lemma 2.2. Let $\{s_k \ne 0, k \in \mathbb{N}\}$ be a sequence that converges
to zero. If $d$ is an accumulation point of $\{d_k = s_k/\|s_k\|, k \in \mathbb{N}\}$ such that $f'(x; d) < 0$ and $0 < c_0 < c_1 < 1$,
then

    $\frac{c_1}{c_0} [m_x(s_k) - f(x)] \le f'(x; s_k)$        (15a)

holds for sufficiently large $k \in \mathbb{N}$.

Proof. From the Lipschitz continuity of $m_x$ and Lemma 2.1 we obtain

    $\lim_{k \in \mathbb{N},\ k \to +\infty} \frac{m_x(s_k) - f(x)}{\|s_k\|} = f'(x; d)$.        (15b)

Since $f'(x; d) < 0$ and $0 < c_0 < c_1$, the continuity of $f'(x; \cdot)$ and (15b) imply that

    $f'(x; s_k) \ge \frac{c_1}{c_0} [m_x(s_k) - f(x)]$.        (15c)

□

The algorithmic implication of the following lemma is very important, as will be seen in Lemma 6.1.
Observe that the choice (4) would not allow us to establish this result.

Lemma 2.4. Assume the hypotheses of Lemma 2.2. Let $\{s_k \ne 0, k \in \mathbb{N}\}$ be a sequence that converges
to zero and satisfies

    $f(x + s_k) > f(x) + c_0 \gamma(x, s_k)$        (16)

where $0 < c_0 < 1$ and $\gamma$ satisfies (12). Then

    $f'(x; d) \ge 0$        (17)

holds for any accumulation point $d$ of the sequence $\{d_k = s_k/\|s_k\|, k \in \mathbb{N}\}$.

Proof. Let $t_k = \|s_k\|$ and $d_k = s_k/\|s_k\|$. Let $d$ be any accumulation point of $\{d_k, k \in \mathbb{N}\}$. From (16)
and (12) we obtain

    $\frac{f(x + t_k d_k) - f(x)}{t_k} > c_0 f'(x; d_k)$

which implies (17), since $f$ is Lipschitz near $x$ and $0 < c_0 < 1$. □

The standard definition of a stationary point $x_*$ of a real-valued function $f$ in unconstrained nonsmooth
optimization is that $0 \in \partial f(x_*)$. In our case, the function $f$ is regular, therefore this characterization is
equivalent to

    $f'(x_*; s) \ge 0$        (18)

for all $s$ in $\mathbb{R}^n$ (see (9)). The following proposition relates the definition of stationarity to the set of minimizers
of the local model.

Proposition 2.1. Let $f = \|F\|$ where $F : \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable. Then $x_* \in \mathbb{R}^n$ is a
stationary point of $f$ if and only if for all $s \in \mathbb{R}^n$

    $\|F(x_*)\| \le \|F(x_*) + F'(x_*)s\|$        (19)

or equivalently $m_{x_*}(0) \le m_{x_*}(s)$ for all $s \in \mathbb{R}^n$, where $m_x$ is given in (11).

Proof. Suppose that $x_*$ is a stationary point of $f$, i.e. $f'(x_*; s) \ge 0$ for all $s \in \mathbb{R}^n$. By Lemma 2.2, we
have

    $f'(x_*; s) \le m_{x_*}(s) - m_{x_*}(0)$

for all $s \in \mathbb{R}^n$. This, together with (18), implies (19). Now assume that (19) holds, and let $s$ be any point
of $\mathbb{R}^n$. Then, we have

    $\frac{m_{x_*}(ts) - m_{x_*}(0)}{t} \ge 0 \quad \forall t > 0$.

This, together with Lemma 2.1, implies that $f'(x_*; s) \ge 0$. □

From Proposition 2.1 it is obvious that any solution of the nonlinear system (1) is a stationary point
of $f = \|F\|$. In the following theorem, we establish a sufficient condition for a stationary point $x_*$ to be a
solution of Problem (1).

Theorem 2.1. Let $x_*$ be a stationary point of $f = \|F\|$. Then either $F(x_*) = 0$, or the linear system

    $F(x_*) + F'(x_*)s = 0$        (20)

does not have a solution.

Proof. Assume that $F(x_*) \ne 0$ and consider a solution $s_*$ of the linear system (20), i.e.

    $F(x_*) + F'(x_*)s_* = 0$.

From Lemma 2.2, we conclude that

    $f'(x_*; s_*) = -\|F(x_*)\| < 0$.

This contradicts the hypothesis that $x_*$ is a stationary point of $f$. □
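A one-dimensional example, chosen here purely for illustration, shows that the second alternative of Theorem 2.1 can indeed occur: a stationary point of $f$ need not solve $F(x) = 0$ when the Jacobian is singular there.

```latex
F(x) = x^2 + 1 \quad (n = 1), \qquad f(x) = |F(x)| = x^2 + 1, \\[4pt]
f'(0; s) = \lim_{t \downarrow 0} \frac{(ts)^2 + 1 - 1}{t} = 0 \;\ge\; 0 \quad \forall s \in \mathbb{R}, \\[4pt]
F(0) = 1 \ne 0, \qquad F(0) + F'(0)\,s = 1 + 0 \cdot s = 1 \ne 0 \quad \forall s \in \mathbb{R}.
```

Here $x_* = 0$ satisfies (18), so it is a stationary point of $f$, yet $F(x_*) \ne 0$ and the linear system (20) has no solution, exactly as the theorem permits.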

3. The General Trust-Region Algorithm

In this section we define our general trust-region algorithm for approximating a solution of the
nondifferentiable optimization problem

    minimize over $x \in \mathbb{R}^n$:  $f(x) = \|F(x)\|$

where $F : \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable.

Let $c_i$, $i = 0, \ldots, 5$ be positive scalars such that

    $0 < c_0 < 1$,    $0 < c_1 < c_2 < 1 < c_3$,    $0 < c_4 < c_5 < 1$.

Also let $\delta_{min}$ be an arbitrarily small positive scalar, let $x_0$ be any point in $\mathbb{R}^n$, let $\delta_0$ be any positive scalar,
and let $\|\cdot\|_a$ and $\|\cdot\|_b$ be any two norms on $\mathbb{R}^n$. Consider a real-valued upper semi-continuous function $\gamma$
defined on

    $\mathcal{D} = \{ (x, s) \in \mathbb{R}^n \times \mathbb{R}^n \mid m_x(s) - f(x) < 0 \}$        (21a)

and satisfying

    $\max \{ f'(x; s),\ \frac{c_1}{c_0} [m_x(s) - f(x)] \} \le \gamma(x, s) < 0$.        (21b)

Suppose that $x_k$ and $\delta_k$ are the iterate and the trust-region radius determined by the algorithm at the
$k$th iteration. The algorithm determines $x_{k+1}$ and $\delta_{k+1}$ in the following manner:

STEP 1. Set $\mu_k = \delta_k$.

STEP 2. Obtain $s_k$ as a solution of the model trust-region subproblem (6), with the trust-region radius taken to be $\mu_k$.

STEP 3. If $f(x_k + s_k) \le f(x_k) + c_0 \gamma(x_k, s_k)$, set $x_{k+1} = x_k + s_k$ and go to STEP 4;

        Else choose $\mu_k$ such that $c_4\|s_k\|_b \le \mu_k \le c_5\|s_k\|_b$ and go to STEP 2.

STEP 4. If $f(x_k + s_k) < f(x_k) + c_2 [m_k(s_k) - f(x_k)]$,

        choose $\delta_{k+1}$ so that $\|s_k\|_b \le \delta_{k+1} \le \max(\mu_k, c_3\|s_k\|_b)$;

        Else if $f(x_k + s_k) > f(x_k) + c_2 [m_k(s_k) - f(x_k)]$,

        choose $\delta_{k+1}$ such that $c_4\|s_k\|_b \le \delta_{k+1} \le c_5\|s_k\|_b$;

        Else choose $\delta_{k+1}$ so that $c_4\|s_k\|_b \le \delta_{k+1} \le \max(\mu_k, c_3\|s_k\|_b)$.

STEP 5. Set $\delta_{k+1} = \max(\delta_{k+1}, \delta_{min})$.

Definition 3.1. The scalar $\mu_k$ for which the test in STEP 3 of the algorithm is satisfied will be said to
determine an acceptable step with respect to $(x_k, \delta_k)$. (Observe that it is not an arbitrary $\mu_k$ in $(0, \delta_k]$.)

Remark. If $x_k$ is not a stationary point of $f = \|F\|$ and $\delta_k > 0$, we obtain from Lemma 6.1 that
$(x_k, s_k) \in \mathcal{D}$ defined in (21a) with $m_{x_k} = m_k$. Therefore inequality (21b) is consistent and STEP 3 is well
defined.

Possible choices for the function $\gamma$ used in STEP 3 of the algorithm are

    $\gamma(x, s) = \frac{c_1}{c_0} [m_x(s) - f(x)]$        (22a)

for $c_1 \le c_0$, or

    $\gamma(x, s) = f'(x; s)$        (22b)

for $c_0 < c_1$ and sufficiently small $s$. In the choice (22b), $\gamma$ is upper semi-continuous (see [2, pp. 25-26]). In
the choice (22a), $\gamma$ is obviously continuous. Because of (21b) and Lemmas 2.3 and 2.4, our theory does not
allow the Duff, Nocedal and Reid choice [7], i.e.

    $\gamma(x, s) = -\|F'(x)s\|_1$        (22c)

(see (4)) unless, by Lemma 2.2, we have the extreme case

    $f'(x_k; s_k) = -\|F'(x_k)s_k\|_1 \quad \forall k \in \mathbb{N}$.

Near a solution we expect $s_k$ to be the Newton step, and in this case the choices (22a), (22b), and (22c) are
equivalent (see Lemma 2.2). It follows that the asymptotic properties of the respective algorithms would be
the same.
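STEPS 1-5 of the algorithm in this section can be sketched in the scalar case $n = 1$, where every norm is a multiple of the absolute value and the subproblem (6) has the closed-form solution "Newton step clipped to the trust region". The sketch below is an illustration under several assumptions that are not part of the algorithm's statement: it uses the choice (22a) with $c_1 = c_0$, it always picks the upper endpoint of each admissible interval for the radius, and the function name, default constants and test problem are hypothetical.

```python
import math

def general_trust_region_scalar(F, dF, x0, delta0=1.0, delta_min=1e-3,
                                c0=0.1, c1=0.1, c2=0.75, c3=2.0,
                                c4=0.25, c5=0.5, tol=1e-12, max_iter=100):
    """Scalar (n = 1) sketch of STEPS 1-5 with gamma chosen as in (22a).

    c4 only delimits the admissible intervals; this sketch always takes
    the upper endpoint, so c4 is not used in the computation.
    """
    x, delta = x0, delta0
    for _ in range(max_iter):
        Fx, Jx = F(x), dF(x)
        fx = abs(Fx)
        if fx <= tol or Jx == 0.0:
            break  # solved, or the model cannot be decreased (x is stationary)
        mu = delta                      # STEP 1
        while True:
            # STEP 2: minimize |Fx + Jx*s| subject to |s| <= mu.
            s = -Fx / Jx
            if abs(s) > mu:
                s = math.copysign(mu, s)
            m = abs(Fx + Jx * s)
            gamma = (c1 / c0) * (m - fx)  # choice (22a); requires c1 <= c0
            f_new = abs(F(x + s))
            if f_new <= fx + c0 * gamma:  # STEP 3: the step is acceptable
                break
            mu = c5 * abs(s)              # STEP 3: one value in [c4|s|, c5|s|]
        # STEP 4: choose one admissible radius from the relevant interval.
        threshold = fx + c2 * (m - fx)
        if f_new > threshold:
            delta = c5 * abs(s)
        else:
            delta = max(mu, c3 * abs(s))
        delta = max(delta, delta_min)     # STEP 5
        x = x + s
    return x
```

For example, with $F(x) = \arctan x$ and $x_0 = 10$ the full Newton step is roughly $-149$ and overshoots badly, whereas `general_trust_region_scalar(math.atan, lambda x: 1.0 / (1.0 + x * x), 10.0)` takes clipped steps while far from the solution and full Newton steps once the iterates are close to $0$.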

4. A Fundamental Property of Trust-Region Algorithms

In this section we will demonstrate that trust-region algorithms enjoy the satisfying property that as
the radius of the trust region approaches zero the solutions of the model trust-region problem approach
directions of steepest descent of $f$. For the case where this norm is $\|\cdot\|_2$, this result is well-known and is often
used as a theoretical tool. This result will play an important role in the convergence analysis developed in
a later section.

Theorem 4.1. Let $\omega : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz and let $x \in \mathbb{R}^n$ be such that the one-sided directional
derivative $\omega'(x; s)$ exists for all $s \in \mathbb{R}^n$. Also let $\{\delta_k, k \in \mathbb{N}\}$ be a sequence of real numbers decreasing to 0.
Consider a sequence $\{s_k, k \in \mathbb{N}\}$, where $s_k$ is a solution of the problem

    minimize    $\omega(x + s)$
    subject to  $\|s\| \le \delta_k$.

If $s_k \ne 0$ for all $k \in \mathbb{N}$, then any accumulation point $d_*$ of $\{d_k = s_k/\|s_k\|, k \in \mathbb{N}\}$ is a steepest descent
direction for $\omega$ at $x$ with respect to the norm $\|\cdot\|$.

Proof. Let $s$ be any vector of norm one, and let $d_*$ be any accumulation point of $\{d_k, k \in \mathbb{N}\}$. By
choosing a subsequence, if needed, we can assume without loss of generality that $\{d_k, k \in \mathbb{N}\}$ converges to
$d_*$. We have

    $\frac{1}{\|s_k\|} [\omega(x + s_k) - \omega(x)] \le \frac{1}{\|s_k\|} [\omega(x + \|s_k\| s) - \omega(x)]$.        (23)

By using the quantities $d_k = s_k/\|s_k\|$ and $t_k = \|s_k\|$ in (23) we obtain

    $\frac{\omega(x + t_k d_*) - \omega(x)}{t_k} + \frac{\omega(x + t_k d_k) - \omega(x + t_k d_*)}{t_k} \le \frac{\omega(x + t_k s) - \omega(x)}{t_k}$

which implies, because $\omega$ is locally Lipschitz, that

    $\omega'(x; d_*) \le \omega'(x; s)$.

This inequality means that $d_*$ is a steepest descent direction for $\omega$ at $x$ with respect to the norm $\|\cdot\|$. □

Remark. In our application the function $\omega$ can represent either $f$ or $m_x$ (see Lemma 2.1).
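To see how the steepest descent direction of Theorem 4.1 depends on the norm, consider the special case, assumed here only for illustration, where $\omega$ is differentiable at $x$ with gradient $g = \nabla\omega(x) \ne 0$, so that $\omega'(x; d) = g^T d$.

```latex
d_* \in \arg\min_{\|d\| = 1} \; g^T d, \qquad \min_{\|d\| = 1} g^T d = -\|g\|_{\mathrm{dual}}, \\[6pt]
\|\cdot\| = \|\cdot\|_2: \quad d_* = -\,g / \|g\|_2, \\[4pt]
\|\cdot\| = \|\cdot\|_\infty: \quad d_* = -\operatorname{sign}(g) \ \text{(componentwise; unique when every } g_i \ne 0\text{)}, \\[4pt]
\|\cdot\| = \|\cdot\|_1: \quad d_* = -\operatorname{sign}(g_j)\, e_j, \quad j \in \arg\max_i |g_i| .
```

For $g = (2, 3)$ this gives $d_* = -(2, 3)/\sqrt{13}$, $d_* = (-1, -1)$ and $d_* = (0, -1)$ respectively, so the limiting direction of the trust-region steps genuinely changes with the norm used in the constraint.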

5. Zangwill's Global Convergence Theory

In numerical optimization, most algorithms are iterative. Namely, given a point $z_0 \in \mathbb{R}^n$, a sequence
of points $\{z_k, k \in \mathbb{N}\}$ is generated recursively according to the defining relation $z_{k+1} \in A(z_k)$, where $A$ is a
point-to-set map and any point in the set $A(z_k)$ is an acceptable successor point of $z_k$.

Notice that the model does not specify the type of problem we are solving. We refer to the set of solutions
as the solution set $P$. For a specific application, $A$ and $P$ must be defined.

Our motivation for using point-to-set maps to model our algorithm stems from the following theorem due
to Zangwill [23]. We present the theorem as stated in Huard [14]. We first need the following definitions.
Definition 5.1. The point-to-set map $A$ is said to be upper-continuous at $x \in \mathbb{R}^n$ if $\{x_k, k \in \mathbb{N}\}$
converges to $x$ and $\{y_k \in A(x_k), k \in \mathbb{N}\}$ converges to $y$ implies that $y \in A(x)$.

Definition 5.2. The point-to-set map $A$ is said to be lower-continuous at $x \in \mathbb{R}^n$ if for any sequence
$\{x_k, k \in \mathbb{N}\}$ converging to $x$ and for any $y \in A(x)$, there exist a sequence $\{y_k, k \in \mathbb{N}\}$ converging to $y$ and
an integer $\bar{k}$ such that $y_k \in A(x_k)$ for $k \ge \bar{k}$.

Definition 5.3. The point-to-set map $A$ is said to be continuous at $x \in \mathbb{R}^n$ if it is both upper-continuous
and lower-continuous at $x$.

Theorem 5.1. Consider a compact set $E \subset \mathbb{R}^n$, a solution set $P \subset E$, a point-to-set map $A : E \to 2^E$,
and a continuous function $h : E \to \mathbb{R}$. Assume that for any $z \in E$ with $z \notin P$ we have

(i) $A(z) \ne \emptyset$.

(ii) $h(z') < h(z)$ for any $z' \in A(z)$.

(iii) $A$ is upper-continuous at $z$.

Assume further that a sequence $\{z_k, k \in \mathbb{N}\}$ has been obtained by the following recursion relation: let $z_0$
be any point in $E$; if $z_k \notin P$ then $z_{k+1} \in A(z_k)$, otherwise $z_{k+1} = z_k$. Then any accumulation point $z_*$ of
$\{z_k, k \in \mathbb{N}\}$ is contained in $P$.

Proof. This theorem is Convergence Theorem A in [23] or is a consequence of Corollary 3 and Remark
6 in [14]. □

More details regarding point-to-set maps can be found in Berge [1], Denel [4], Hogan [11], Huard [12],
Huard [13] and Huard [14], and Meyer [16].
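The recursion in Theorem 5.1 can be sketched on a toy instance, constructed here only for illustration: $E = [-1, 1]$, $P = \{0\}$, $h(z) = |z|$, and $A(z) = \{z/2, -z/3\}$. This map is nonempty-valued, strictly decreases $h$ off $P$, and is upper-continuous, so the theorem predicts that every accumulation point lies in $P$. The helper name `iterate_point_to_set` is hypothetical.

```python
import random

def iterate_point_to_set(A, in_solution_set, z0, steps, rng):
    """Generate z_{k+1} in A(z_k), freezing the iterate once it lies in P."""
    z = z0
    history = [z]
    for _ in range(steps):
        if not in_solution_set(z):
            z = rng.choice(A(z))  # any element of A(z) is an acceptable successor
        history.append(z)
    return history

# Toy instance (an assumption for illustration): E = [-1, 1], P = {0}, h(z) = |z|.
def A(z):
    return [z / 2.0, -z / 3.0]

def h(z):
    return abs(z)

history = iterate_point_to_set(A, lambda z: z == 0.0, 1.0, 60, random.Random(0))
```

Whichever successor is selected at each step, the merit function decreases strictly along `history` and the iterates approach the solution set, which is the behaviour the theorem guarantees without specifying how the successor is chosen.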

6. Global Convergence of the General Trust-Region Algorithm

We will establish global convergence of the General Trust-Region Algorithm described in Section 3 by
modeling it by a point-to-set map $A$ which satisfies the hypotheses of Zangwill's theorem (Theorem 5.1).

If we considered only $c_0 = c_1$ and $\gamma$ given by (22a) (Powell's choice in [19]), then we could obtain global
convergence of our arbitrary norm trust-region algorithm from the global convergence theory developed by
Powell [19]. However, even for this special case the results established in Sections 4, 7, and 8 would be new
and important contributions.

In order to apply Theorem 5.1, we need the following lemma whose proof will be given later.

Lemma 6.1. Let $x_0$ be any point in $\mathbb{R}^n$. If the subset $X_0 = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$ of $\mathbb{R}^n$ is bounded,
then there exists a positive scalar $\delta_{max}$ such that the trust-region radius $\delta_k$ satisfies

    $0 < \delta_k \le \delta_{max} \quad \forall k \in \mathbb{N}$.        (24)

We will define the compact set of Theorem 5.1 as

    $E = X_0 \times [\delta_{min}, \delta_{max}]$,        (25)

the solution set $P$ will consist of the points $(x, \delta) \in E$ such that $x$ is a stationary point of $f$, and the merit
function $h$ will be

    $h(x, \delta) = f(x)$.        (26)

Finally, the point-to-set map $A$ will be defined as follows:

Definition of the point-to-set map $A$. For $z \in P$, we set $A(z) = \{z\}$, and for $z = (x, \delta) \notin P$ we say that
$z' = (x', \delta') \in A(z)$ if the scalar $\mu$ that determines, with respect to $(x, \delta)$, an acceptable step $s$ and $x' = x + s$
satisfy the following five conditions:

(a) $0 < \mu \le \delta$,

(b) $s = s(\mu) \in \mathrm{argmin} \{ m_x(s) \mid \|s\|_b \le \mu \}$,

(c) $f(x') \le f(x) + c_0 \gamma(x, s)$,

(d) if $f(x') < f(x) + c_2 [m_x(s) - f(x)]$,

    then $\|s\|_b \le \delta' \le \max(\delta, c_3\|s\|_b)$;

    else if $f(x') > f(x) + c_2 [m_x(s) - f(x)]$,

    then $c_4\|s\|_b \le \delta' \le c_5\|s\|_b$;

    else $c_4\|s\|_b \le \delta' \le \max(\delta, c_3\|s\|_b)$;

and

(e) $\delta' = \max(\delta', \delta_{min})$.        (27)

We now state our global convergence theorem.

Theorem 6.1. Consider a continuously differentiable function $F : \mathbb{R}^n \to \mathbb{R}^n$. Let $\|\cdot\|_a$ and $\|\cdot\|_b$ be
arbitrary norms on $\mathbb{R}^n$, let $x_0$ be an arbitrary point in $\mathbb{R}^n$, let $f(x) = \|F(x)\|_a$, and let $\mathcal{D}$ be defined in
(21a). Assume that the level set $X_0 = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$ is bounded and that the function $\gamma : \mathcal{D} \to \mathbb{R}$
used in Step 3 of the General Trust-Region Algorithm is upper semi-continuous and satisfies (21b). Then any
accumulation point of the sequence $\{x_k, k \in \mathbb{N}\}$ generated by the General Trust-Region Algorithm presented
in Section 3 using $x_0$ as initial point is a stationary point of $f$.

The proof of the theorem will require the use of Lemma 6.1 and the following lemmas whose proofs will
be given shortly. We will use $\|\cdot\|$ for either norm $\|\cdot\|_a$ or $\|\cdot\|_b$ since their use will be clear from the context.

Lemma 6.2. Consider $(x, \delta)$ where $\delta > 0$ and $x$ is not a stationary point of $f$. Then the General
Trust-Region Algorithm cannot loop infinitely often between STEP 3 and STEP 2.

Lemma 6.3. The point-to-set map $A$ is upper-continuous at any $(x, \delta) \in E - P$.

Proof of Theorem 6.1. It is sufficient to prove that the conditions of Theorem 5.1 hold. Because of
Lemma 6.1 the subset $E$ of $\mathbb{R}^n \times \mathbb{R}$ defined by (25) is compact. We also have that the function $h$ defined
on $E$ by (26) is continuous. Let us show that conditions (i), (ii), and (iii) of Theorem 5.1 hold. First, let
$z = (x, \delta) \notin P$. Then, by (25) it is obvious that $\delta > 0$, and by Lemma 6.2, there exist $\mu \in (0, \delta]$ and
$s \in \mathrm{argmin} \{ m_x(s) \mid \|s\| \le \mu \}$ such that $x' = x + s$ satisfies

    $f(x') \le f(x) + c_0 \gamma(x, s)$.

The existence of $\delta'$ such that $z' = (x', \delta') \in A(z)$ is obvious. Therefore, property (i) of Zangwill's Theorem
5.1 holds. Secondly, if $(x', \delta') \in A(x, \delta)$ it is straightforward from (26), (27c), and the fact that $\gamma$ satisfies
(21b), that $h(x', \delta') < h(x, \delta)$. This is condition (ii) of Theorem 5.1. The third condition (iii) follows from
Lemma 6.3. Therefore, our theorem is a consequence of Zangwill's Theorem 5.1 and Lemmas 6.1, 6.2, and
6.3. □

We now return to the proofs of Lemmas 6.1, 6.2, and 6.3.

Proof of Lemma 6.1. The sequence $\{s_k = x_{k+1} - x_k, k \in \mathbb{N}\}$ is bounded, say by $M$, because the iterates
remain in the bounded set $X_0$. Since $\delta_{k+1} \le \max(\mu_k, c_3\|s_k\|)$, $\mu_k \le \delta_k$, and $\|s_k\| \le M$, we obtain

    $\delta_{k+1} \le \max(\delta_k, c_3 M)$.        (28)

Assume that there exists a subsequence $\{\delta_k, k \in N' \subset \mathbb{N}\}$ diverging to $+\infty$. Let $k_* \in N'$ be the smallest
integer such that $\delta_{k_*} > c_3 M$. Then we obtain $\delta_j \le \delta_{k_*}$ for all $j \ge k_*$, $j \in \mathbb{N}$. This contradicts the divergence
hypothesis of $\{\delta_k, k \in N' \subset \mathbb{N}\}$. Consequently there exists a positive scalar $\delta_{max}$ such that (24) holds. □
Proof of Lemma 6.2. We prove the contrapositive. Suppose that the algorithm loops indefinitely. Let
$\{x_j, j \in \mathbb{N}\}$ be the sequence generated by letting $x_j = x + s_j$, where $s_j$ is a solution of the following model
trust-region problem

    minimize    $m_x(s) = \|F(x) + F'(x)s\|$
    subject to  $\|s\| \le \mu_j$.

Observe that $\|s_{j+1}\| \le \mu_{j+1} \le c_5\|s_j\|$ for all $j \in \mathbb{N}$ and that $0 < c_5 < 1$, so the sequence $\{\|s_j\|, j \in \mathbb{N}\}$
decreases to 0. Under our hypothesis the test in Step 3 fails for all $j \in \mathbb{N}$; thus, since $\gamma(x, s_j) \ge f'(x; s_j)$,
we have

    $f(x + s_j) > f(x) + c_0 f'(x; s_j)$.        (29)

Therefore, from Lemma 2.4 we obtain

    $f'(x; d_*) \ge 0$

where $d_*$ is any accumulation point of $\{d_j = s_j/\|s_j\|, j \in \mathbb{N}\}$. But from Theorem 4.1 and Lemma 2.1 we obtain that
$d_*$ is a steepest descent direction for $f$ at $x$. Consequently, for all $d \in \mathbb{R}^n$ with norm one, we have

    $f'(x; d) \ge 0$,

which implies that $x$ is a stationary point of $f$. □

To prove Lemma 6.3 we need the following lemma.

Lemma 6.4. Suppose that the sequence $\{(x_k, \delta_k) \notin P, k \in \mathbb{N}\}$ converges to some $(x, \delta) \notin P$. If $\mu_k$ is a
scalar that determines an acceptable step with respect to $(x_k, \delta_k)$, then any accumulation point of $\{\mu_k, k \in \mathbb{N}\}$,
say $\mu$, satisfies the inequality

    $\mu > 0$.        (30)

Proof of Lemma 6.3. Let $\{(x_k, \delta_k), k \in \mathbb{N}\}$ be a sequence that converges to $(x_*, \delta_*)$ and let $\{(x'_k, \delta'_k) \in
A(x_k, \delta_k), k \in \mathbb{N}\}$ be a sequence that converges to some $(x'_*, \delta'_*)$. We want to establish that $(x'_*, \delta'_*) \in
A(x_*, \delta_*)$. By the definition of $A$, $(x'_k, \delta'_k) \in A(x_k, \delta_k)$ implies that there exists a positive scalar $\mu_k$ determining
an acceptable step $s_k$ such that, for $x'_k = s_k + x_k$, the following conditions hold:

    $0 < \mu_k \le \delta_k$,        (31a)

    $s_k \in \mathrm{argmin} \{ m_k(s) \mid \|s\| \le \mu_k \}$,        (31b)

    $f(x'_k) \le f(x_k) + c_0 \gamma(x_k, s_k)$.        (31c)

The sequence $\{s_k, k \in \mathbb{N}\}$ converges to $s_*$ such that $x'_* = x_* + s_*$. Let $\mu_*$ be any accumulation point of the
sequence $\{\mu_k, k \in \mathbb{N}\}$. Since $(x_*, \delta_*) \notin P$, i.e. $x_*$ is not a stationary point of $f$, by Lemma 6.4 and (31a) we
obtain that

    $0 < \mu_* \le \delta_*$.        (32)

Now we establish that

    $s_* \in \mathrm{argmin} \{ m_{x_*}(s) \mid \|s\| \le \mu_* \}$.        (33)

We can rewrite (31b) as

    $s_k \in \mathrm{argmin} \{ \phi(s, x_k, \mu_k) \mid s \in T(x_k, \mu_k) \}$,
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where  <j>(s,  x,/i)  -  mx(s)  and  T(x,ji)  =  {s£  IRn  |  ||s||  <  //},  The  point-to-set  map  T  is  the  composition  of 
the  projection  n(j/,  -r)  =  r  which  is  continuous  and  the  point-to-set  map  B(r)  —  {s  G  IRn  |  r  >  ||s||}  which  is 
continuous  by  Theorem  A. 10  of  [13].  Consequently,  by  Theorem  A. 6  of  [13],  the  point-to-set  map  T  =  B  o  II 
is  continuous.  Therefore,  since  the  function  <6  is  also  continuous,  we  obtain  from  Theorem  A. 15  of  [13]  that 
the  point-to-set  map 

:  (x,n)  — *  argmin  \  s  G  T(x,fi)} 

is  upper-continuous.  Because  {(xj,,  /ij,),  k  G  N'  C  IN}  converges  to  (x*,/i„),  {s*,  G  rp(xk,fik),k  G  N'} 
converges  to  s* ,  the  upper-continuity  of  implies  (33).  The  upper  semi-continuity  of  the  real-valued  function 
7  implies  that  the  function  g  defined  by 

g(x,  s)  =  f(x  +  s)  -  f{x)  -  cQ~f(x,  s) 

is  lower  semi-continuous.  This  implies,  because  of  (31c),  that 

f(x*  +  s*)  f  (x* )  -p  co7(a;» ,  s* ) .  (34) 

Properties (32), (33) and (34) establish the first three properties needed to conclude that $(x'_*, \delta'_*)$ belongs to $A(x_*, \delta_*)$, i.e. (27a), (27b), and (27c). Let us establish the fourth property (27d). Suppose that

$$f(x_* + s_*) - f(x_*) - c_2 [m_{x_*}(s_*) - m_{x_*}(0)] < 0. \qquad (35a)$$

The sequence $\{(x_k, \delta_k), k \in \mathbb{N}\}$ converges to $(x_*, \delta_*)$, so for all large $k \in \mathbb{N}$ we have

$$f(x_k + s_k) - f(x_k) - c_2 [m_k(s_k) - m_k(0)] < 0,$$

which gives

$$\|s_k\| \le \delta'_k \le c_3 \max(\delta_k, c_3 \|s_k\|),$$

and consequently, we obtain

$$\|s_*\| \le \delta'_* \le c_3 \max(\delta_*, c_3 \|s_*\|). \qquad (35b)$$

Now suppose that

$$f(x_* + s_*) - f(x_*) - c_2 [m_{x_*}(s_*) - m_{x_*}(0)] > 0. \qquad (36a)$$

We establish in the same way as (35b) that

$$c_4 \|s_*\| \le \delta'_* \le c_5 \|s_*\|. \qquad (36b)$$

Finally, if neither (35a) nor (36a) holds, then necessarily we have

$$f(x_* + s_*) - f(x_*) - c_2 [m_{x_*}(s_*) - m_{x_*}(0)] = 0, \qquad (37a)$$

and it is obvious that

$$c_4 \|s_*\| \le \delta'_* \le \max(\delta_*, c_3 \|s_*\|) \qquad (37b)$$

holds. Properties (35), (36), and (37) establish (27d). The fifth property

$$\delta'_* = \max(\delta', \delta_{\min})$$

is obvious. We conclude that $(x'_*, \delta'_*) \in A(x_*, \delta_*)$, and hence the map $A$ is upper-continuous. □

Now  we  prove  Lemma  6.4. 

Proof of Lemma 6.4. Let $\mu$ be any accumulation point of $\{\mu_k, k \in \mathbb{N}\}$. Without loss of generality, we can assume that $\{\mu_k, k \in \mathbb{N}\}$ converges to $\mu$. It follows that $\mu_k \le \delta_k$. We consider two cases:

Case i). We suppose that there exists a subsequence $\{\mu_k, k \in N' \subset \mathbb{N}\}$ such that $\mu_k = \delta_k$, in which case we have $\mu = \delta$. Consequently we obtain (30) because $\delta \ge \delta_{\min}$.

Case ii). Suppose that $\mu_k < \delta_k$ for all sufficiently large $k \in \mathbb{N}$. Therefore $\delta_k$ never gives an acceptable step. Let $s_k$ be the last non-acceptable step obtained by decreasing $\delta_k$. Since $\delta_k > 0$ and $x_k$ is not a stationary point of $f$ we have, by Lemma 6.2, that $s_k \ne 0$ and $\mu_k > 0$. Also we have for large $k \in \mathbb{N}$

$$\mu_k \ge c_4 \|s_k\|. \qquad (38)$$


Assume that $\mu = 0$. From inequality (38) we obtain that $\{s_k, k \in \mathbb{N}\}$ converges to zero.

Let $s_k^* \in \operatorname{argmin}\{m_k(s) \mid \|s\| \le \mu_k\}$, and let $d^*$ be any accumulation point of $\{d_k^* = s_k^*/\|s_k^*\|, k \in \mathbb{N}\}$. Since $\{\mu_k > 0, k \in \mathbb{N}\}$ converges to zero, we obtain from Theorem 4.1 and Lemma 2.1 that $d^*$ is a steepest descent direction of $f$ at $x$. Consider a subsequence $\{d_k^*, k \in N \subset \mathbb{N}\}$ that converges to $d^*$, and let $\alpha_k$ be a positive scalar such that $\|\alpha_k s_k^*\| = \|s_k\|$. Then we have for all sufficiently large $k \in N$

$$\frac{m_k(s_k) - f(x_k)}{\|s_k\|} \le \frac{m_k(\alpha_k s_k^*) - f(x_k)}{\|\alpha_k s_k^*\|}. \qquad (39)$$

Let us set $t_k = \|s_k\| = \|\alpha_k s_k^*\|$, $y_k^* = \alpha_k s_k^*$ and

$$d_k = \frac{s_k}{\|s_k\|}.$$

Therefore (39) becomes

$$\frac{m_k(t_k d_k) - f(x_k)}{t_k} \le \frac{m_k(t_k d_k^*) - f(x_k)}{t_k},$$

which implies, since $\|d_k\| = \|d_k^*\| = 1$,

$$\frac{f(x_k + t_k d_k) - f(x_k)}{t_k} \le \frac{f(x_k + t_k d_k^*) - f(x_k)}{t_k} + \frac{o(t_k)}{t_k}.$$


Therefore we obtain

$$\limsup_{k \in N \to +\infty} \frac{f(x_k + t_k d_k) - f(x_k)}{t_k} \le \limsup_{y \to x,\ t \downarrow 0} \frac{f(y + t d^*) - f(y)}{t},$$

or, since $f$ is regular,

$$\limsup_{k \in N \to +\infty} \frac{f(x_k + t_k d_k) - f(x_k)}{t_k} \le f'(x; d^*). \qquad (40a)$$

Moreover, since $s_k$ is not acceptable, we have

$$f(x_k + s_k) > f(x_k) + c_0 \gamma(x_k, s_k),$$

which implies, together with (21b), that for sufficiently large $k \in N$

$$\frac{f(x_k + t_k d_k) - f(x_k)}{t_k} > c_1 \frac{m_k(t_k d_k) - f(x_k)}{t_k}.$$

But since $F$ is continuously differentiable (see (23)), this implies

$$\frac{f(x_k + t_k d_k) - f(x_k)}{t_k} > c_1 \frac{f(x_k + t_k d_k) - f(x_k)}{t_k} + \frac{o(t_k)}{t_k},$$

and because $0 < c_1 < 1$

$$\limsup_{k \in N \to +\infty} \frac{f(x_k + t_k d_k) - f(x_k)}{t_k} \ge 0. \qquad (40b)$$

From (40a) and (40b), we obtain

$$f'(x; d^*) \ge 0, \qquad (41a)$$

and since $d^*$ is a steepest descent direction of $f$ at $x$, this implies that

$$f'(x; s) \ge 0 \qquad (41b)$$

for all $s \in \mathbb{R}^n$, which contradicts the hypothesis that $x$ is not a stationary point of $f$. Therefore any accumulation point of the sequence $\{\mu_k, k \in \mathbb{N}\}$, say $\mu$, satisfies (30). □

7.  Convergence  to  a  Solution  of  F(x)  =  0 

In this section we establish a mild condition which guarantees that any accumulation point of the sequence $\{x_k, k \in \mathbb{N}\}$ generated by the General Trust-Region Algorithm is actually a solution of the nonlinear system $F(x) = 0$. We then demonstrate that if an accumulation point $x_*$ is such that $F'(x_*)$ is nonsingular, then the iteration sequence actually converges to $x_*$.

Theorem 7.1. Let $S$ be the set of accumulation points of the sequence $\{x_k, k \in \mathbb{N}\}$ generated by the General Trust-Region Algorithm. Under the assumptions of Theorem 6.1, one of the following holds:

(i) all accumulation points are solutions of the nonlinear system, i.e.,

$$F(x_*) = 0 \quad \forall x_* \in S; \qquad (42a)$$

(ii) for all $x_* \in S$, the linear system

$$F(x_*) + F'(x_*) s = 0 \qquad (42b)$$

does not have a solution.

To  prove  this  theorem  we  will  need  the  following  lemma. 

Lemma 7.1. Let $h : \mathbb{R}^n \to \mathbb{R}$ be continuous. Also let $\{z_k, k \in \mathbb{N}\}$ be a bounded sequence such that the sequence $\{h(z_k), k \in \mathbb{N}\}$ is decreasing. Then the function $h$ is constant on the set of accumulation points of $\{z_k, k \in \mathbb{N}\}$.

Proof. Let $z^*$ and $z'$ be two accumulation points of the sequence $\{z_k, k \in \mathbb{N}\}$. Then there exist two subsequences $\{z_k, k \in N\}$ and $\{z_k, k \in N'\}$ that converge respectively to $z^*$ and $z'$. Since $\{h(z_k)\}$ is decreasing, for every $j$ in $N$ there exists $k_j$ in $N'$ such that

$$h(z_{k_j}) \le h(z_j), \quad k_j \ge j. \qquad (43)$$

From the continuity of $h$ and (43) we obtain

$$h(z') \le h(z^*). \qquad (44)$$

Since the roles of $z'$ and $z^*$ in establishing (44) are symmetric we conclude that

$$h(z^*) = h(z'),$$

which establishes the lemma. □
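
As a small numerical illustration of Lemma 7.1 (not part of the original argument), the sketch below uses a hypothetical bounded sequence $z_k = (-1)^k (1 + 1/k)$ and the continuous function $h(z) = z^2$. Both choices are assumptions made only for this example. The sequence has the two accumulation points $+1$ and $-1$, the values $h(z_k)$ decrease, and $h$ takes the same value at both accumulation points.

```python
def h(z):
    """Continuous test function; chosen only for this illustration."""
    return z * z

# Bounded sequence z_k = (-1)^k (1 + 1/k) with accumulation points +1 and -1.
z = [(-1) ** k * (1.0 + 1.0 / k) for k in range(1, 2001)]
values = [h(zk) for zk in z]

# The hypothesis of the lemma: {h(z_k)} is decreasing.
is_decreasing = all(a > b for a, b in zip(values, values[1:]))

# Tails of the even- and odd-indexed subsequences.
even_tail = z[-1]   # k = 2000, close to +1
odd_tail = z[-2]    # k = 1999, close to -1

# The conclusion of the lemma: h agrees on the two accumulation points,
# so the gap between h on the two tails shrinks to zero.
gap = abs(h(even_tail) - h(odd_tail))
print(is_decreasing, even_tail, odd_tail, gap)
```

The gap shrinks as more terms are taken, consistent with $h(1) = h(-1)$.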

Proof of Theorem 7.1. The function $f = \|F\|_a$ is continuous, the sequence $\{x_k, k \in \mathbb{N}\}$ is bounded and the sequence $\{f(x_k), k \in \mathbb{N}\}$ is decreasing. Therefore, by Lemma 7.1, $f = \|F\|_a$ is constant on the set $S$ of accumulation points of $\{x_k, k \in \mathbb{N}\}$. By Theorem 6.1, any $x_* \in S$ is a stationary point of $f$. Therefore, by Theorem 2.1, either every $x_* \in S$ solves the nonlinear system (1) or none of the linear systems (42b) has a solution. □
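
To see that alternative (ii) can genuinely occur, consider the following one-dimensional instance. It is an illustrative assumption, not an example taken from the paper.

```latex
% Hypothetical scalar example with n = 1:
F(x) = x^2 + 1, \qquad f(x) = |F(x)| = x^2 + 1 .
% f attains its minimum at x_* = 0, so x_* is a stationary point of f:
f'(0; s) = 0 \ge 0 \quad \text{for all } s \in \mathbb{R}.
% However F(x_*) = 1 and F'(x_*) = 0, so the linear system (42b) reads
F(x_*) + F'(x_*)\, s = 1 + 0 \cdot s = 0 ,
% which has no solution, and F(x_*) \ne 0.
```

Here the stationary point of $f$ is not a zero of $F$, and the linearized system at that point is inconsistent, exactly as case (ii) describes.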

Corollary 7.1. Under the assumptions of Theorem 6.1, if the sequence $\{x_k, k \in \mathbb{N}\}$ generated by the General Trust-Region Algorithm has an accumulation point, say $x_*$, such that $F'(x_*)$ is nonsingular, then $F(x_*) = 0$ and $\{x_k, k \in \mathbb{N}\}$ converges to $x_*$.

Proof. By Theorem 6.1, the accumulation point $x_*$ is a stationary point of $f = \|F\|_a$. Since $F'(x_*)$ is nonsingular, the linear system (42b) has a solution. Therefore, by Theorem 7.1, we obtain

$$F(x_*) = 0.$$

Now the convergence of the sequence $\{x_k, k \in \mathbb{N}\}$ to $x_*$ follows from Theorem 3.3 of Eisenstat and Walker [8]. □

Remark. Homer Walker pointed out to the authors that the new Eisenstat and Walker theory [8] could be used to actually demonstrate convergence of the sequence $\{x_k, k \in \mathbb{N}\}$ as stated in Corollary 7.1.

8. Q-quadratic Convergence of the General Trust-Region Algorithm

Corollary 7.1 shows that the algorithm, under mild assumptions, generates a sequence $\{x_k, k \in \mathbb{N}\}$ which converges to a nonsingular solution. Under the same assumptions, we prove that, for large $k$, the General Trust-Region Algorithm reduces to Newton's method and consequently the convergence of $\{x_k, k \in \mathbb{N}\}$ to $x_*$ is q-quadratic.

Theorem 8.1. Assume that the hypotheses of Theorem 6.1 hold. Also assume that the sequence $\{x_k, k \in \mathbb{N}\}$ generated by the General Trust-Region Algorithm has an accumulation point, say $x_*$, such that $F'(x_*)$ is nonsingular and $F'$ is Lipschitz near $x_*$. Then for sufficiently large $k$, $x_k$ is the Newton iterate for the nonlinear equation $F(x) = 0$, and consequently the convergence of $\{x_k, k \in \mathbb{N}\}$ to $x_*$ is q-quadratic.

Proof. By Corollary 7.1, the iteration sequence converges to $x_*$. To prove that the algorithm, for large $k$, is equivalent to Newton's method, first we establish that the test

$$f(x_{k+1}) \le f(x_k) + c_2 [m_k(s_k) - m_k(0)] \qquad (45)$$

is satisfied for large $k$. Since $F$ is continuously differentiable we have

$$f(x_k + s_k) = \|F(x_k) + F'(x_k) s_k + o(\|s_k\|)\|,$$

and therefore

$$f(x_k) - f(x_k + s_k) \ge f(x_k) - [m_k(s_k) + \|o(\|s_k\|)\|].$$

Because $f(x_k) - m_k(s_k) > 0$ this implies that

$$\frac{f(x_k) - f(x_k + s_k)}{f(x_k) - m_k(s_k)} \ge 1 - \frac{\|o(\|s_k\|)\|}{\|s_k\|} \cdot \frac{\|s_k\|}{f(x_k) - m_k(s_k)}. \qquad (46)$$

Let us show that the ratio

$$\frac{f(x_k) - m_k(s_k)}{\|s_k\|}$$

is bounded away from zero. Since $\{x_k, k \in \mathbb{N}\}$ converges to $x_*$, $F'(x_*)$ is nonsingular and $F$ is continuously differentiable, there exist $k_* \in \mathbb{N}$ and $\lambda_* > 0$ such that $F'(x_k)$ is nonsingular for all $k \ge k_*$ and

$$\|F'(x_k) d\| \ge \lambda_* \|d\| \quad \forall d \in \mathbb{R}^n \text{ and } \forall k \ge k_*. \qquad (47)$$


Let us set

$$\alpha_k = \frac{\|s_k\|}{\|s_k^N\|}, \qquad y_k = \alpha_k s_k^N, \qquad (48)$$

where $s_k^N$ is the Newton step, i.e.

$$F'(x_k) s_k^N + F(x_k) = 0. \qquad (49)$$

The definition of $s_k$ and the nonsingularity of $F'(x_k)$ imply that either $s_k = s_k^N$ or $\|s_k\| < \|s_k^N\|$. Therefore the inequality $\|s_k\| \le \|s_k^N\|$ holds for all sufficiently large $k$, which shows that $\alpha_k \in (0, 1]$. We have, since $\|y_k\| = \|s_k\|$,

$$f(x_k) - m_k(y_k) \le f(x_k) - m_k(s_k). \qquad (50)$$

From (48), (49), (50) and $\|s_k\| = \|y_k\|$ we obtain

$$\frac{\|F'(x_k) s_k^N\|}{\|s_k^N\|} \le \frac{f(x_k) - m_k(s_k)}{\|s_k\|}.$$
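
The step from (48), (49) and (50) to this bound can be spelled out as follows. This is a short worked derivation added for the reader, using only the model $m_k(s) = \|F(x_k) + F'(x_k)s\|$, $f = \|F\|$ and $\alpha_k \in (0, 1]$.

```latex
% By (49), F(x_k) = -F'(x_k) s_k^N, and by (48), y_k = \alpha_k s_k^N, so
m_k(y_k) = \| F(x_k) + \alpha_k F'(x_k) s_k^N \|
         = \| (1 - \alpha_k) F(x_k) \|
         = (1 - \alpha_k)\, f(x_k).
% Hence the model decrease along y_k is
f(x_k) - m_k(y_k) = \alpha_k f(x_k) = \alpha_k \| F'(x_k) s_k^N \| .
% Dividing by \|y_k\| = \alpha_k \|s_k^N\| = \|s_k\| and using (50):
\frac{\| F'(x_k) s_k^N \|}{\| s_k^N \|}
  = \frac{f(x_k) - m_k(y_k)}{\| y_k \|}
  \le \frac{f(x_k) - m_k(s_k)}{\| s_k \|} .
```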

Using inequality (47) we get

$$0 < \lambda_* \le \frac{f(x_k) - m_k(s_k)}{\|s_k\|} \qquad (51)$$

for all $k \ge k_*$. Property (51) and inequality (46) imply that for $k \ge k_*$ we have

$$\frac{f(x_k) - f(x_k + s_k)}{f(x_k) - m_k(s_k)} \ge 1 - \frac{1}{\lambda_*} \frac{\|o(\|s_k\|)\|}{\|s_k\|}.$$

On the other hand, there exists an integer, say $k_+$, such that

$$1 - \frac{1}{\lambda_*} \frac{\|o(\|s_k\|)\|}{\|s_k\|} \ge c_2$$

for all $k \ge k_+$. Consequently, inequality (45) holds for $k \ge k_+$. Furthermore the trust-region radius is updated according to the rule

$$\|s_k\| \le \delta_{k+1} \le \max(\mu_k, c_3 \|s_k\|). \qquad (52)$$

Also, since $0 < c_1 < c_2$, we obtain from (21b) that

$$c_0 \gamma(x_k, s_k) \ge c_2 [m_k(s_k) - m_k(0)],$$

which, together with (45), implies that $\delta_k$ determines an acceptable step with respect to $(x_k, \delta_k)$, i.e. $\mu_k = \delta_k$ for all $k \ge k_+$. Therefore, for $k \ge k_+$ the trust-region radius $\delta_k$ is updated according to the rule

$$\|s_k\| \le \delta_{k+1} \le \max(\delta_k, c_3 \|s_k\|). \qquad (53)$$

Suppose that there exists an integer $k_1 \ge k_+$ such that $s_k \ne s_k^N$ for all $k \ge k_1$. This implies that $\|s_k\| = \delta_k$ for all $k \ge k_1$ and by (52) $\|s_k\| \le \|s_{k+1}\|$ for all $k \ge k_1$. This contradicts the hypothesis that $\{s_k = x_{k+1} - x_k, k \in \mathbb{N}\}$ converges to zero. Therefore for all $j \ge k_*$ there exists an integer $k_j \ge j$ such that

$$s_{k_j} = s_{k_j}^N. \qquad (54)$$

Let $\mathcal{N}(x_*)$ be a sufficiently small neighborhood of $x_*$ where the local q-quadratic convergence occurs (see [6]). Let $j$ be the smallest integer such that $x_{k_j} \in \mathcal{N}(x_*)$. Newton steps in $\mathcal{N}(x_*)$ verify

$$\|s_{k_j+1}^N\| < \|s_{k_j}^N\|,$$

which, by (53) and (54), imply

$$\|s_{k_j+1}^N\| < \delta_{k_j+1},$$

and consequently

$$s_{k_j+1} = s_{k_j+1}^N$$

and $k_j + 1 = k_{j+1}$. By induction, we establish that

$$s_k = s_k^N$$

holds for all sufficiently large $k$, say $k \ge k'$. Consequently the sequence $\{x_k, k \ge k'\}$ generated by the local version of the General Trust-Region Algorithm is q-quadratically convergent to $x_*$. □
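
The behaviour described by Theorem 8.1 can be observed on a scalar problem. In one dimension every norm is a multiple of the absolute value, so the minimizer of $|F(x_k) + F'(x_k)s|$ over $|s| \le \delta$ is simply the Newton step clipped to $[-\delta, \delta]$. The sketch below is a simplified illustration under stated assumptions, not the paper's algorithm: the function name `trust_region_scalar`, the parameter values, and the choice of radius inside the ranges allowed by (36b) and (53) are all hypothetical, and the acceptance test uses (45) alone instead of the condition involving $c_0 \gamma(x_k, s_k)$.

```python
import math

def trust_region_scalar(F, dF, x0, delta0=1.0, c2=0.25, c3=2.0, c4=0.25,
                        tol=1e-12, max_iter=100):
    """Scalar trust-region iteration for F(x) = 0 with f(x) = |F(x)|.

    Returns the final iterate and, per iteration, whether the accepted
    step was the full Newton step. Assumes dF(x) != 0 along the iterates.
    """
    x, delta = x0, delta0
    newton_flags = []
    for _ in range(max_iter):
        fx = abs(F(x))
        if fx <= tol:
            break
        newton = -F(x) / dF(x)                 # Newton step, cf. (49)
        while True:
            # Minimizer of |F(x) + F'(x) s| over |s| <= delta.
            s = max(-delta, min(delta, newton))
            model_decrease = fx - abs(F(x) + dF(x) * s)
            actual_decrease = fx - abs(F(x + s))
            if actual_decrease >= c2 * model_decrease:   # test (45)
                break
            delta = c4 * abs(s)                # reject: shrink the radius
        newton_flags.append(s == newton)
        x += s
        delta = max(delta, c3 * abs(s))        # upper end of the range in (53)
    return x, newton_flags

# F(x) = arctan(x): plain Newton diverges from x0 = 10, the safeguarded
# iteration does not, and its final steps are full Newton steps.
root, flags = trust_region_scalar(math.atan, lambda t: 1.0 / (1.0 + t * t), 10.0)
print(root, flags)
```

The first steps are restricted by the trust region, while the last ones coincide with Newton steps, which is the finite-termination-to-Newton behaviour the theorem asserts for nonsingular solutions.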

9.  Summary  and  Concluding  Remarks 

A very successful trust-region algorithm for approximating the solution of the square nonlinear system of equations $F(x) = 0$ is the well-known Levenberg-Marquardt trust-region algorithm. The model trust-region problem in the Levenberg-Marquardt algorithm has the form

minimize $\|F(x) + F'(x)s\|_2$ (55a)

subject to $\|s\|_2 \le \delta$, (55b)

where $\|\cdot\|_2$ denotes the $\ell_2$-norm on $\mathbb{R}^n$.

Recently Duff, Nocedal and Reid [7] suggested a trust-region algorithm where the Levenberg-Marquardt model trust-region problem (55) is replaced with the model trust-region problem

minimize $\|F(x) + F'(x)s\|_1$ (56a)

subject to $\|s\|_\infty \le \delta$. (56b)

In (56a) $\|\cdot\|_1$ denotes the $\ell_1$-norm on $\mathbb{R}^n$ and in (56b) $\|\cdot\|_\infty$ denotes the $\ell_\infty$-norm on $\mathbb{R}^n$. The subproblem (56) can be solved using linear programming techniques and allows one to take advantage of sparsity in $F'(x)$. Duff, Nocedal and Reid [7] gave no convergence analysis, but included convincing numerical experimentation.
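
One standard way to write (56) as a linear program is sketched below. It is offered as an illustration of why linear programming applies, and is not necessarily the formulation used in [7]. The auxiliary variables $t_i$ are introduced here for the reformulation.

```latex
% Variables: s \in \mathbb{R}^n and auxiliary t \in \mathbb{R}^n.
\begin{aligned}
\text{minimize}   \quad & \sum_{i=1}^{n} t_i \\
\text{subject to} \quad & -t_i \le F_i(x) + \bigl(F'(x)\,s\bigr)_i \le t_i,
                          \qquad i = 1, \dots, n, \\
                        & -\delta \le s_j \le \delta,
                          \qquad j = 1, \dots, n.
\end{aligned}
% At an optimum t_i = |F_i(x) + (F'(x)s)_i|, so the objective equals
% \|F(x) + F'(x)s\|_1, and the bound constraints express \|s\|_\infty \le \delta.
```

The constraint matrix contains $F'(x)$ as a block, so any sparsity in the Jacobian carries over directly to the linear program.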

Motivated by the work of Duff, Nocedal and Reid, in this paper we have presented a General Trust-Region Algorithm where the model trust-region problem has the form

minimize $\|F(x) + F'(x)s\|_a$ (57a)

subject to $\|s\|_b \le \delta$, (57b)

where $\|\cdot\|_a$ and $\|\cdot\|_b$ are arbitrary but fixed norms on $\mathbb{R}^n$. Levenberg-Marquardt and Duff-Nocedal-Reid are special cases of our General Trust-Region Algorithm.

Using the tools from convex analysis, nonsmooth optimization and the Zangwill convergence theorem we have established an effective global convergence theory for our General Trust-Region Algorithm. The specialization of our theory to the case when $\|\cdot\|_a = \|\cdot\|_b = \|\cdot\|_2$, i.e., Levenberg-Marquardt, gives a global convergence theorem which is competitive with the standard result.

Our global convergence theory indicates that the choice Duff, Nocedal and Reid made for the descent condition (criterion for accepting the solution of the model trust-region problem) can be improved and we suggest alternative choices. Using these choices our global theory applies to the algorithm suggested by Duff, Nocedal and Reid. Moreover, using the new Eisenstat and Walker theory, we have been able to show that the iteration sequence actually converges to a solution of the nonlinear system.

It is satisfying to us that we have been able to demonstrate that our General Trust-Region Algorithm reduces to Newton's method after a finite number of steps and consequently the convergence of the algorithm is q-quadratic.

It is also satisfying that we have been able to demonstrate, for the General Trust-Region Algorithm, an analog of the well-known result that the solution of the Levenberg-Marquardt model trust-region problem approaches a steepest descent direction as the trust-region radius approaches zero.

While we have stated our algorithm for functions of the form $f = \|F\|$, a significant amount of our formulation and theory applies to more general functions, e.g., regular or locally Lipschitz $f$.